suppressPackageStartupMessages(suppressWarnings({
library(dplyr)
library(tidyr)
library(ggplot2)
library(knitr)
library(cowplot)
}))
knitr::opts_chunk$set(message=FALSE, warning=FALSE, cache=TRUE, fig.width=10, fig.height=7)
The University of Bristol UoB plans to commence the 2020 academic year with a strategy of placing students in living circles, to minimise the spread of SARS-CoV-2 across the student body. To monitor outbreaks it would be costly to simply test everybody on a regular basis, and so whether pooling samples could effectively reduce costs and still adequately detect outbreaks is an open question.
These simulations seek to explore pooling strategies that aim to
Distribution of students per living circle:
load("../data/circles.rdata")
hist(ids$count, xlab="Number of students per circle", breaks=40, main="")
Our objective is to simulate the RT-PCR test for presence of SARS-Cov-2 in samples from all individuals under different disease transmission scenarios. There are four main components to the model – viral load sampling, disease transmission between students, pooling allocation of collected samples, and testing performance. We compare different aspects of testing performance across three strategies – per-individual testing, pooled testing, or pooled testing with per-individual follow-up in positive pools.
At time step 0, a random set of individuals is selected to be infected according to some initial prevalence P(0). The number of people each individual goes on to infect is set as
\[ R_i\sim Poisson\left(R\right) \]
We allow the \(R\) value to vary across simulation scenarios.
high = within circle = 0.9, within location = 0.09, anywhere else = 0.01medium = within circle = 0.6, within location = 0.3, anywhere else = 0.1low = within circle = 0.34, within location = 0.33, anywhere = 0.33Based on analysis from Adam Finn and Amy Thomas
The RT-PCR testing process involves an exponential growth phase of viral RNA, such that after a number of replication cycles n, the viral load
\[ V_{i,n}=\frac{\sum_{D} V_{i,0}}{D}\left(1+E\right)^n \]
where \(E\) is the efficiency of the reaction, \(\sum_{D} V_{i,0}\) is the starting viral load in the sample, and \(D\) is the number samples in the testing pool. In the case of the individual test, \(D=1\) and the \(V_{i,0}\) is the viral load of the individual. We assume that the efficiency lies between 0.65 and 0.9, such as
\[ E\sim Beta\left(\alpha,\beta\right)\times\left(0.9-0.65\right)+0.65 \]
In order to test positive, the RT-PCR test must detect a viral load of some fluorescence threshold corresponding to a some number of RNA copies \(V_{C_T}\) after some number of cycles \(n=C_t\). We can write down the \(V_{C_t}\) value given \(C_t\), distribution of \(V_{i,0}\) values, dilution factor \(D\), parameter values for the distribution of efficiencies \(E\), and the expected test sensitivity as a quantile function
\[ V_{C_t}=Q\left(D,R_0,C_t,\alpha,\beta,{1-T}_{se}\right) \]
A distribution of \(V_{i,0}\) values can be obtained from the viral load model. Through experimental calibration we expect that the average sensitivity of the test is \(T_{se}=0.98\) when \(D=1\), falling to 0.8 when \(D=10\) using \(C_t=35\) cycles. Hence, we must identify a set of \(\alpha\) and \(\beta\) parameters that determine the test efficiency distribution that satisfies the calibrated test sensitivities given the assumed distribution of viral loads in the samples. To achieve this we use a general optimisation function, where we want to find an \(V_{C_t}\) threshold that is estimated to be identical for the two dilution scenarios:
\[ \underset{\alpha, \beta}{\arg \min} \left ( Q \left(D=1,R_0,C_t=35,\alpha,\beta,T_{se}=0.02\right)-Q\left(D=10,R_0,C_t=35,\alpha, \beta,T_{se}=0.2\right) \right)^2 \]
We used the optim function in R 4.0.2 to solve this optimisation function, obtaining parameter values of \(\alpha=\ 2.60\) and \(\beta=1.33\).
See the ct.html document (ct.rmd) on how this is done.
Was previously not allowing any heterogeneity between Ct values using the following - now deprecated
Assume 98% of \(TPR=0.98\) and specificity of \(TNR = 1\) for a single sample (not pooled / diluted). By pooling samples, the dilution is assumed to be the proportion positive divided by the proportion in the pool. In order to simulate this appropriately we need a function that will relate sensitivity to dilution fraction.
Use a sigmoid function that allows
\[ TPR_{d} = -\frac{1}{1 + e^{-d/4 + 4}} + 1 \]
where \(d\) is the dilution factor, calculated as the total number of people in the pool divided by the number of infected people in the pool. Ideally would get some empirical results on this (e.g. see this paper that suggests 8x dilution doesn’t affect the assay https://linkinghub.elsevier.com/retrieve/pii/S1198743X20303499).
In the latter we use a bin packing algorithm to minimise distribution of living circles across test pools
To maximise the efficacy of pooling, assume that more transmission will occur within circles, and so pooling based on circles should reduce the dilution factor. For the simulations, determine a set assay pool size and allocate individuals to pools as follows.
Once the pools are determined, in order to report on whether circles are infected need to reassemble the circles from the pools and declare if a particular circle has any infected based on whether any of its constituent pools have been found positive.
The simulations also follow up positive pools by individually testing everyone within that pool.
Define outbreak as 2 people in a location being infected. This is detected either from
Note: this needs to be improved in the model
load("../results/sim.rdata")
Use the following costs:
res$nstudent <- res$reagentuse.ind_tests
Cost difference in pooling vs pooling with followup
assay_ppv <- function(sensitivity, prevalence, specificity)
{
sensitivity * prevalence / (sensitivity * prevalence + (1-specificity) * (1-prevalence))
}
ress <- res %>%
group_by(prevalence, spread, containment, pool_size, random_pooling) %>%
summarise(
cost.ind_tests = mean(cost.ind_tests, na.rm=TRUE),
cost.pool_tests = mean(cost.pool_tests, na.rm=TRUE),
cost.poolfollowup_tests = mean(cost.poolfollowup_tests, na.rm=TRUE),
reagentuse.ind_tests = mean(reagentuse.ind_tests, na.rm=TRUE),
reagentuse.pool_tests = mean(reagentuse.pool_tests, na.rm=TRUE),
reagentuse.poolfollowup_tests = mean(reagentuse.poolfollowup_tests, na.rm=TRUE),
sensitivity.ind_tests = mean(sensitivity.ind_tests1, na.rm=TRUE),
sensitivity.pool_tests = mean(sensitivity.pool_tests1, na.rm=TRUE),
sensitivity.poolfollowup_tests = mean(sensitivity.poolfollowup_tests1, na.rm=TRUE),
specificity.ind_tests = mean(specificity.ind_tests1, na.rm=TRUE),
specificity.pool_tests = mean(specificity.pool_tests1, na.rm=TRUE),
specificity.poolfollowup_tests = mean(specificity.poolfollowup_tests1, na.rm=TRUE),
sensitivity.outbreak.ind_tests = mean(sensitivity.outbreak.ind_tests, na.rm=TRUE),
sensitivity.outbreak.pool_outbreak_detected = mean(sensitivity.outbreak.pool_outbreak_detected[is.finite(sensitivity.outbreak.pool_outbreak_detected)], na.rm=TRUE),
sensitivity.outbreak.poolfollowup_outbreak_detected = mean(sensitivity.outbreak.poolfollowup_outbreak_detected[is.finite(sensitivity.outbreak.poolfollowup_outbreak_detected)], na.rm=TRUE),
specificity.outbreak.ind_tests = mean(specificity.outbreak.ind_tests[is.finite(specificity.outbreak.ind_tests)], na.rm=TRUE),
specificity.outbreak.pool_outbreak_detected = mean(specificity.outbreak.pool_outbreak_detected[is.finite(specificity.outbreak.pool_outbreak_detected)], na.rm=TRUE),
specificity.outbreak.poolfollowup_outbreak_detected = mean(specificity.outbreak.poolfollowup_outbreak_detected[is.finite(specificity.outbreak.poolfollowup_outbreak_detected)], na.rm=TRUE),
true_prevalence=mean(true_prevalence),
prevalence.ind_tests = mean(prevalence.ind_tests1, na.rm=TRUE),
prevalence.pool_tests = mean(prevalence.pool_tests1, na.rm=TRUE),
prevalence.poolfollowup_tests = mean(prevalence.poolfollowup_tests1, na.rm=TRUE),
efficiency.ind_tests = mean(sensitivity.ind_tests / cost.ind_tests),
efficiency.pool_tests = mean(sensitivity.pool_tests / cost.pool_tests),
efficiency.poolfollowup_tests = mean(sensitivity.poolfollowup_tests / cost.poolfollowup_tests),
ppv.ind_tests = assay_ppv(sensitivity.ind_tests, true_prevalence, specificity.ind_tests),
ppv.pool_tests = assay_ppv(sensitivity.pool_tests, true_prevalence, specificity.pool_tests),
ppv.poolfollowup_tests = assay_ppv(sensitivity.poolfollowup_tests, true_prevalence, specificity.poolfollowup_tests),
efficiency2.ind_tests = mean(ppv.ind_tests / cost.ind_tests),
efficiency2.pool_tests = mean(ppv.pool_tests / cost.pool_tests),
efficiency2.poolfollowup_tests = mean(ppv.poolfollowup_tests / cost.poolfollowup_tests),
efficiency3.ind_tests = mean(cost.ind_tests / ppv.ind_tests),
efficiency3.pool_tests = mean(cost.pool_tests / ppv.pool_tests),
efficiency3.poolfollowup_tests = mean(cost.poolfollowup_tests / ppv.poolfollowup_tests),
sensitivity.ind_lfd = mean(sensitivity.lfd1, na.rm=TRUE),
sensitivity.ind_lfd2 = mean(sensitivity.lfd_ever, na.rm=TRUE),
specificity.ind_lfd = mean(specificity.lfd1, na.rm=TRUE),
specificity.ind_lfd2 = mean(specificity.lfd_ever, na.rm=TRUE),
reagentuse.ind_lfd = 0,
reagentuse.ind_lfd2 = 0,
ppv.ind_lfd = assay_ppv(sensitivity.ind_lfd, true_prevalence, specificity.ind_lfd),
ppv.ind_lfd2 = assay_ppv(sensitivity.ind_lfd2, true_prevalence, specificity.ind_lfd2),
cost.ind_lfd = reagentuse.ind_tests * 5,
cost.ind_lfd2 = reagentuse.ind_tests * 10,
prevalence.ind_lfd = sensitivity.ind_lfd * true_prevalence + (1-specificity.ind_lfd),
prevalence.ind_lfd2 = sensitivity.ind_lfd2 * true_prevalence + (1-specificity.ind_lfd2),
efficiency.ind_lfd = mean(sensitivity.ind_lfd / cost.ind_lfd),
efficiency.ind_lfd2 = mean(sensitivity.ind_lfd2 / cost.ind_lfd2),
efficiency2.ind_lfd = mean(ppv.ind_lfd / cost.ind_lfd),
efficiency2.ind_lfd2 = mean(ppv.ind_lfd2 / cost.ind_lfd2),
efficiency3.ind_lfd = mean(cost.ind_lfd / ppv.ind_lfd),
efficiency3.ind_lfd2 = mean(cost.ind_lfd2 / ppv.ind_lfd2)
)
ress$group <- paste(ress$prevalence, ress$spread, ress$containment, ress$random_pooling)
temp <- group_by(ress, prevalence, spread, containment) %>%
summarise(
pool_size=1,
random_pooling=FALSE,
cost.ind_tests = mean(cost.ind_tests, na.rm=TRUE),
cost.ind_lfd = mean(cost.ind_lfd, na.rm=TRUE),
cost.ind_lfd2 = mean(cost.ind_lfd2, na.rm=TRUE),
reagentuse.ind_tests = mean(reagentuse.ind_tests, na.rm=TRUE),
reagentuse.ind_lfd = mean(reagentuse.ind_lfd, na.rm=TRUE),
reagentuse.ind_lfd2 = mean(reagentuse.ind_lfd2, na.rm=TRUE),
sensitivity.ind_tests = mean(sensitivity.ind_tests, na.rm=TRUE),
sensitivity.ind_lfd = mean(sensitivity.ind_lfd, na.rm=TRUE),
sensitivity.ind_lfd2 = mean(sensitivity.ind_lfd2, na.rm=TRUE),
specificity.ind_tests = mean(specificity.ind_tests, na.rm=TRUE),
specificity.ind_lfd = mean(specificity.ind_lfd, na.rm=TRUE),
specificity.ind_lfd2 = mean(specificity.ind_lfd2, na.rm=TRUE),
prevalence.ind_tests = mean(prevalence.ind_tests, na.rm=TRUE),
prevalence.ind_lfd = mean(prevalence.ind_lfd, na.rm=TRUE),
prevalence.ind_lfd2 = mean(prevalence.ind_lfd2, na.rm=TRUE),
efficiency.ind_tests = mean(efficiency.ind_tests, na.rm=TRUE),
efficiency.ind_lfd = mean(efficiency.ind_lfd, na.rm=TRUE),
efficiency.ind_lfd2 = mean(efficiency.ind_lfd2, na.rm=TRUE),
efficiency2.ind_tests = mean(efficiency2.ind_tests, na.rm=TRUE),
efficiency2.ind_lfd = mean(efficiency2.ind_lfd, na.rm=TRUE),
efficiency2.ind_lfd2 = mean(efficiency2.ind_lfd2, na.rm=TRUE),
efficiency3.ind_tests = mean(efficiency3.ind_tests, na.rm=TRUE),
efficiency3.ind_lfd = mean(efficiency3.ind_lfd, na.rm=TRUE),
efficiency3.ind_lfd2 = mean(efficiency3.ind_lfd2, na.rm=TRUE),
ppv.ind_tests = mean(ppv.ind_tests, na.rm=TRUE),
ppv.ind_lfd = mean(ppv.ind_lfd, na.rm=TRUE),
ppv.ind_lfd2 = mean(ppv.ind_lfd2, na.rm=TRUE),
true_prevalence = mean(true_prevalence, na.rm=TRUE)
)
ress2 <- rbind(ress, temp)
ress2$pooling_type <- "Clustered"
ress2$pooling_type[ress2$random_pooling] <- "Random"
rename_labels <- function(x)
{
strsplit(x, split="\\.") %>%
lapply(., function(y)
{
y <- y[2]
y[y=="ind_tests"] <- "Individual PCR"
y[y=="ind_lfd"] <- "Individual LFD"
y[y=="ind_lfd2"] <- "Individual LFD x2"
y[y=="pool_tests"] <- "Pooled PCR"
y[y=="poolfollowup_tests"] <- "Pooled PCR + followup"
y
}) %>% unlist
}
What is the extra cost incurred by following up on positive pools to get infected individuals (R0 (cols) vs baseline prevalence (rows))?
tt <- ress2 %>% subset(., containment == "conquest") %>%
dplyr::select(cost.ind_tests, cost.pool_tests, cost.poolfollowup_tests, cost.ind_lfd, cost.ind_lfd2, group, containment, pool_size, prevalence, spread, pooling_type) %>%
tidyr::gather(key="key", value="cost", cost.pool_tests, cost.poolfollowup_tests, cost.ind_tests, cost.ind_lfd, cost.ind_lfd2) %>%
subset(., grepl("pool", key) | grepl("ind", key) & pool_size == 1) %>%
mutate(
key=paste0(rename_labels(key), " (", pooling_type, ")"),
spread=paste("R =", spread), prevalence=paste("prv =", prevalence)) %>%
arrange(pool_size)
tt$key[grepl("Individual", tt$key)] <- gsub(" \\(Clustered)", "", tt$key[grepl("Individual", tt$key)])
ggplot(subset(tt, pool_size != 1), aes(x=pool_size, y=cost)) +
geom_point(aes(colour=key)) +
geom_line(aes(colour=key)) +
geom_hline(data=subset(tt, grepl("Indiv", key)), aes(yintercept=cost, linetype=key)) +
facet_grid(prevalence ~ spread) +
scale_fill_brewer(type="qual", palette=2) +
theme_bw() +
scale_x_continuous(breaks=subset(tt, pool_size != 1) %>% {unique(.$pool_size)}) +
labs(x="Pool size", linetype="", colour="")
Another visualisation of the same thing, showing that impact of clustering isn’t very high
ress %>% mutate(spread=paste("R =", spread), prevalence=paste("prv =", prevalence)) %>%
ggplot(., aes(x=cost.pool_tests, y=cost.poolfollowup_tests, group=group)) +
geom_point(aes(colour=containment, size=pool_size, shape=random_pooling)) +
geom_line(aes(colour=containment, linetype=random_pooling)) +
geom_abline(slope=1) +
facet_grid(prevalence ~ spread) +
scale_colour_brewer() +
theme_bw() +
theme(axis.text.x=element_text(angle=90, hjust=1, vjust=0.5)) +
labs(x="Cost of pooled PCR", y="Cost of pooled PCR with followup")
Followup tests are relatively more expensive when the spread is higher and the prevalence is higher. Similar for reagent use:
tt <- ress2 %>% subset(., containment == "conquest") %>%
dplyr::select(reagentuse.ind_tests, reagentuse.pool_tests, reagentuse.poolfollowup_tests, reagentuse.ind_lfd, reagentuse.ind_tests, group, containment, pool_size, prevalence, spread, pooling_type) %>%
tidyr::gather(key="key", value="reagentuse", reagentuse.pool_tests, reagentuse.poolfollowup_tests, reagentuse.ind_tests, reagentuse.ind_lfd) %>%
subset(., grepl("pool", key) | grepl("ind", key) & pool_size == 1) %>%
mutate(
key=paste0(rename_labels(key), " (", pooling_type, ")"),
spread=paste("R =", spread), prevalence=paste("prv =", prevalence)) %>%
arrange(pool_size)
tt$key[tt$key == "Individual PCR (Clustered)"] <- "Individual PCR"
tt$key[tt$key == "Individual LFD (Clustered)"] <- "Individual LFD"
tt2 <- subset(tt, pool_size == 2)
tt2$pool_size <- 1
tt2$reagentuse <- subset(tt, key == "Individual PCR")$reagentuse[1]
tt <- rbind(subset(tt, pool_size != 1), tt2)
ggplot(tt, aes(x=pool_size, y=reagentuse)) +
geom_point(aes(colour=key)) +
geom_line(aes(colour=key)) +
facet_grid(prevalence ~ spread) +
scale_fill_brewer(type="qual", palette=2) +
theme_bw() +
scale_x_continuous(breaks=unique(tt$pool_size)) +
labs(x="Pool size", linetype="", colour="", y="Number of RT-qPCR tests")
Example
x <- filter(tt, containment == "conquest", pool_size==20, prevalence == "prv = 0.05", spread == "R = 3")
1-x$reagentuse[3]/x$reagentuse[4]
## [1] 0.1499062
Sensitivity of pool vs pool+followup is basically the same, but both drop sharply with larger pool size given current dilution curves (R0 (cols) vs baseline prevalence (rows)):
tt <- ress2 %>% subset(., containment == "conquest") %>%
dplyr::select(sensitivity.ind_tests, sensitivity.pool_tests, sensitivity.poolfollowup_tests, sensitivity.ind_lfd, sensitivity.ind_lfd2, group, containment, pool_size, prevalence, spread, pooling_type) %>%
tidyr::gather(key="key", value="sensitivity", sensitivity.pool_tests, sensitivity.poolfollowup_tests, sensitivity.ind_tests, sensitivity.ind_lfd, sensitivity.ind_lfd2) %>%
subset(., grepl("pool", key) | grepl("ind", key) & pool_size == 1) %>%
mutate(
key=paste0(rename_labels(key), " (", pooling_type, ")"),
spread=paste("R =", spread), prevalence=paste("prv =", prevalence)) %>%
arrange(pool_size)
tt$key[grepl("Individual", tt$key)] <- gsub(" \\(Clustered)", "", tt$key[grepl("Individual", tt$key)])
ggplot(subset(tt, pool_size != 1), aes(x=pool_size, y=sensitivity)) +
geom_point(aes(colour=key)) +
geom_line(aes(colour=key)) +
geom_hline(data=subset(tt, grepl("Indiv", key)), aes(yintercept=sensitivity, linetype=key)) +
facet_grid(prevalence ~ spread) +
scale_fill_brewer(type="qual", palette=2) +
theme_bw() +
scale_x_continuous(breaks=subset(tt, pool_size != 1) %>% {unique(.$pool_size)}) +
labs(x="Pool size", linetype="", colour="")
Specificity
tt <- ress2 %>% subset(., containment == "conquest") %>%
dplyr::select(specificity.ind_tests, specificity.pool_tests, specificity.poolfollowup_tests, specificity.ind_lfd, specificity.ind_tests, specificity.ind_lfd2, group, containment, pool_size, prevalence, spread, pooling_type) %>%
tidyr::gather(key="key", value="specificity", specificity.pool_tests, specificity.poolfollowup_tests, specificity.ind_tests, specificity.ind_lfd, specificity.ind_lfd2) %>%
subset(., grepl("pool", key) | grepl("ind", key) & pool_size == 1) %>%
mutate(
key=paste0(rename_labels(key), " (", pooling_type, ")"),
spread=paste("R =", spread), prevalence=paste("prv =", prevalence)) %>%
arrange(pool_size)
tt$key[grepl("Individual", tt$key)] <- gsub(" \\(Clustered)", "", tt$key[grepl("Individual", tt$key)])
ggplot(subset(tt, pool_size != 1), aes(x=pool_size, y=1-specificity)) +
geom_point(aes(colour=key)) +
geom_line(aes(colour=key)) +
geom_hline(data=subset(tt, grepl("Indiv", key)), aes(yintercept=1-specificity, linetype=key)) +
facet_grid(prevalence ~ spread) +
scale_fill_brewer(type="qual", palette=2) +
# theme_bw() +
scale_x_continuous(breaks=subset(tt, pool_size != 1) %>% {unique(.$pool_size)}) +
scale_y_log10() +
labs(x="Pool size", linetype="", colour="", y="False positive rate (log10 scale)")
How many unnecessary quarantines if we quarantine whole circles when a positive pool comes back (R0 (cols) vs baseline prevalence (rows)):
ggplot(ress, aes(y=specificity.pool_tests, x=pool_size, group=group)) +
geom_point(aes(colour=containment, shape=random_pooling)) +
geom_line(aes(colour=containment, linetype=random_pooling)) +
facet_grid(prevalence ~ spread) +
scale_colour_brewer() +
labs(x="Pool size", y="Specificity of pool tests", colour="Containment", shape="Random pooling", linetype="Random pooling")
ggplot(ress, aes(y=specificity.poolfollowup_tests, x=pool_size, group=group)) +
geom_point(aes(colour=containment, shape=random_pooling)) +
geom_line(aes(colour=containment, linetype=random_pooling)) +
facet_grid(prevalence ~ spread) +
scale_colour_brewer() +
labs(x="Pool size", y="Specificity of pool+followup tests", colour="Containment", shape="Random pooling", linetype="Random pooling")
ggplot(ress, aes(y=sensitivity.pool_tests, x=pool_size, group=group)) +
geom_point(aes(colour=containment, shape=random_pooling)) +
geom_line(aes(colour=containment, linetype=random_pooling)) +
facet_grid(prevalence ~ spread) +
scale_colour_brewer() +
labs(x="Pool size", y="Sensitivity of pool tests", colour="Containment", shape="Random pooling", linetype="Random pooling")
ggplot(ress, aes(y=sensitivity.poolfollowup_tests, x=pool_size, group=group)) +
geom_point(aes(colour=containment, shape=random_pooling)) +
geom_line(aes(colour=containment, linetype=random_pooling)) +
facet_grid(prevalence ~ spread) +
scale_colour_brewer() +
labs(x="Pool size", y="Sensitivity of pool+followup", colour="Containment", shape="Random pooling", linetype="Random pooling") +
theme_minimal_grid()
Positive predictive value
tt <- ress2 %>% subset(., containment == "conquest") %>%
dplyr::select(ppv.ind_tests, ppv.pool_tests, ppv.poolfollowup_tests, ppv.ind_lfd, ppv.ind_lfd2, group, containment, pool_size, prevalence, spread, pooling_type) %>%
tidyr::gather(key="key", value="ppv", ppv.pool_tests, ppv.poolfollowup_tests, ppv.ind_tests, ppv.ind_lfd, ppv.ind_lfd2) %>%
subset(., grepl("pool", key) | grepl("ind", key) & pool_size == 1) %>%
mutate(
key=paste0(rename_labels(key), " (", pooling_type, ")"),
spread=paste("R =", spread), prevalence=paste("prv =", prevalence)) %>%
arrange(pool_size)
tt$key[grepl("Individual", tt$key)] <- gsub(" \\(Clustered)", "", tt$key[grepl("Individual", tt$key)])
ggplot(subset(tt, pool_size != 1), aes(x=pool_size, y=ppv)) +
geom_point(aes(colour=key)) +
geom_line(aes(colour=key)) +
geom_hline(data=subset(tt, grepl("Indiv", key)), aes(yintercept=ppv, linetype=key)) +
facet_grid(prevalence ~ spread) +
scale_fill_brewer(type="qual", palette=2) +
theme_bw() +
scale_x_continuous(breaks=subset(tt, pool_size != 1) %>% {unique(.$pool_size)}) +
labs(x="Pool size", linetype="", colour="", y="Positive predictive value")
Estimating prevalence
tt <- ress2 %>% subset(., containment == "conquest") %>%
dplyr::select(prevalence.ind_tests, prevalence.ind_lfd, prevalence.ind_lfd2, prevalence.pool_tests, prevalence.poolfollowup_tests, group, containment, pool_size, pooling_type, prevalence, spread, true_prevalence) %>%
tidyr::gather(key="key", value="est_prevalence", prevalence.pool_tests, prevalence.poolfollowup_tests, prevalence.ind_tests, prevalence.ind_lfd, prevalence.ind_lfd2) %>%
subset(., grepl("pool", key) | grepl("ind", key) & pool_size == 1) %>%
subset(., pool_size %in% c(1,2,5,10,20,30)) %>%
mutate(key=paste0(rename_labels(key), " (", pooling_type, ")"))
tt$key[grepl("Individual", tt$key)] <- gsub(" \\(Clustered)", "", tt$key[grepl("Individual", tt$key)])
ggplot(tt, aes(x=true_prevalence, y=est_prevalence)) +
geom_line(aes(colour=as.factor(pool_size), linetype=as.factor(prevalence))) +
geom_point(aes(colour=as.factor(pool_size), shape=as.factor(prevalence))) +
facet_wrap(~ key, ncol=4, scale="free") +
geom_abline(slope=1) +
scale_colour_brewer(type="qual") +
theme_bw() +
labs(x="True prevalence", y="Estimated prevalence", colour="Pool size", linetype="Initial prevalence", shape="Initial prevalence")
Efficiency - sensitivity vs cost ratios
tt <- ress2 %>% subset(., containment == "conquest") %>%
dplyr::select(efficiency.ind_tests, efficiency.pool_tests, efficiency.poolfollowup_tests, efficiency.ind_lfd, efficiency.ind_lfd2, group, containment, pool_size, prevalence, spread, pooling_type) %>%
tidyr::gather(key="key", value="efficiency", efficiency.pool_tests, efficiency.poolfollowup_tests, efficiency.ind_tests, efficiency.ind_lfd, efficiency.ind_lfd2) %>%
subset(., grepl("pool", key) | grepl("ind", key) & pool_size == 1) %>%
mutate(
key=paste0(rename_labels(key), " (", pooling_type, ")"),
spread=paste("R =", spread), prevalence=paste("prv =", prevalence)) %>%
arrange(pool_size)
tt$key[grepl("Individual", tt$key)] <- gsub(" \\(Clustered)", "", tt$key[grepl("Individual", tt$key)])
ggplot(subset(tt, pool_size != 1), aes(x=pool_size, y=efficiency)) +
geom_point(aes(colour=key)) +
geom_line(aes(colour=key)) +
geom_hline(data=subset(tt, grepl("Indiv", key)), aes(yintercept=efficiency, linetype=key)) +
facet_grid(prevalence ~ spread) +
scale_fill_brewer(type="qual", palette=2) +
theme_bw() +
scale_x_continuous(breaks=subset(tt, pool_size != 1) %>% {unique(.$pool_size)}) +
labs(x="Pool size", linetype="", colour="", y="Sensitivity / cost")
tt <- ress2 %>% subset(., containment == "conquest") %>%
dplyr::select(efficiency2.ind_tests, efficiency2.pool_tests, efficiency2.poolfollowup_tests, efficiency2.ind_lfd, efficiency2.ind_lfd2, group, containment, pool_size, prevalence, spread, pooling_type) %>%
tidyr::gather(key="key", value="efficiency2", efficiency2.pool_tests, efficiency2.poolfollowup_tests, efficiency2.ind_tests, efficiency2.ind_lfd, efficiency2.ind_lfd2) %>%
subset(., grepl("pool", key) | grepl("ind", key) & pool_size == 1) %>%
mutate(
key=paste0(rename_labels(key), " (", pooling_type, ")"),
spread=paste("R =", spread), prevalence=paste("prv =", prevalence)) %>%
arrange(pool_size)
tt$key[grepl("Individual", tt$key)] <- gsub(" \\(Clustered)", "", tt$key[grepl("Individual", tt$key)])
ggplot(subset(tt, pool_size != 1), aes(x=pool_size, y=efficiency2)) +
geom_point(aes(colour=key)) +
geom_line(aes(colour=key)) +
geom_hline(data=subset(tt, grepl("Indiv", key)), aes(yintercept=efficiency2, linetype=key)) +
facet_grid(prevalence ~ spread) +
scale_fill_brewer(type="qual", palette=2) +
theme_bw() +
scale_x_continuous(breaks=subset(tt, pool_size != 1) %>% {unique(.$pool_size)}) +
labs(x="Pool size", linetype="", colour="", y="Positive predictive value / cost")
Cost per ppv
tt <- ress2 %>% subset(., containment == "conquest") %>%
dplyr::select(efficiency3.ind_tests, efficiency3.pool_tests, efficiency3.poolfollowup_tests, efficiency3.ind_lfd, efficiency3.ind_lfd2, group, containment, pool_size, prevalence, spread, pooling_type) %>%
tidyr::gather(key="key", value="efficiency3", efficiency3.pool_tests, efficiency3.poolfollowup_tests, efficiency3.ind_tests, efficiency3.ind_lfd, efficiency3.ind_lfd2) %>%
subset(., grepl("pool", key) | grepl("ind", key) & pool_size == 1) %>%
mutate(
key=paste0(rename_labels(key), " (", pooling_type, ")"),
spread=paste("R =", spread), prevalence=paste("prv =", prevalence)) %>%
arrange(pool_size)
tt$key[grepl("Individual", tt$key)] <- gsub(" \\(Clustered)", "", tt$key[grepl("Individual", tt$key)])
ggplot(subset(tt, pool_size != 1), aes(x=pool_size, y=efficiency3)) +
geom_point(aes(colour=key)) +
geom_line(aes(colour=key)) +
geom_hline(data=subset(tt, grepl("Indiv", key)), aes(yintercept=efficiency3, linetype=key)) +
facet_grid(prevalence ~ spread) +
scale_fill_brewer(type="qual", palette=2) +
theme_bw() +
scale_x_continuous(breaks=subset(tt, pool_size != 1) %>% {unique(.$pool_size)}) +
labs(x="Pool size", linetype="", colour="", y="Cost / positive predictive value (lower is better)")
## Need to know
If a pool is detected positive, is it possible to avoid having to test everyone in the pool to identify the infected individuals?
Group testing theory - when prevalence is low then pooling can be more efficient, basically just look for positive results in a pool and then follow up everyone in the pool. To maximise the efficiency of the pooling need to have a sense of the prevalence.
This paper uses tags along with pooling: https://advances.sciencemag.org/content/6/37/eabc5961
This is an R package for group testing: https://journal.r-project.org/archive/2010/RJ-2010-016/RJ-2010-016.pdf
https://linkinghub.elsevier.com/retrieve/pii/S1198743X20303499
Actually the extra costs probably don’t matter too much esp when prevalence low